Discriminatively trained phoneme confusion model for keyword spotting
Abstract
Keyword spotting (KWS) aims to detect speech segments that contain a given query within large amounts of audio data. Typically, a speech recognizer is used in a first indexing step. One of the challenges of KWS is handling recognition errors and out-of-vocabulary (OOV) terms. This work proposes using discriminative training to construct a phoneme confusion model, which expands the phonemic index of a KWS system by adding phonemic variation to address both problems. The objective function being optimized is the Figure of Merit (FOM), which is directly related to KWS performance. Experiments on English data sets show some improvement in the FOM, suggesting that the technique is promising.
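To make the idea of adding phonemic variation concrete, here is a minimal sketch of expanding a phoneme sequence with a confusion table. It is an illustration only, not the discriminatively trained model described in the abstract: the table `CONFUSION`, its probabilities, the phoneme labels, and the function `expand_query` are all hypothetical, and the sketch handles substitutions only (no insertions or deletions).

```python
from typing import Dict, List, Tuple

# Hypothetical confusion table: CONFUSION[x][y] is an assumed probability that
# phoneme x is realised as y. The values are made up for illustration.
CONFUSION: Dict[str, Dict[str, float]] = {
    "p": {"p": 0.8, "b": 0.2},
    "ih": {"ih": 0.7, "iy": 0.3},
    "n": {"n": 0.9, "m": 0.1},
}

Variant = Tuple[Tuple[str, ...], float]


def expand_query(
    phonemes: List[str],
    confusion: Dict[str, Dict[str, float]],
    top_k: int = 3,
) -> List[Variant]:
    """Return up to top_k substitution variants of a phoneme sequence,
    each scored by the product of per-phoneme confusion probabilities."""
    beams: List[Variant] = [((), 1.0)]
    for ph in phonemes:
        # Phonemes missing from the table are kept unchanged with probability 1.
        options = confusion.get(ph, {ph: 1.0})
        candidates = [
            (seq + (alt,), score * p)
            for seq, score in beams
            for alt, p in options.items()
        ]
        # Keep only the top_k highest-scoring partial variants at each step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:top_k]
    return beams


if __name__ == "__main__":
    for seq, score in expand_query(["p", "ih", "n"], CONFUSION):
        print(" ".join(seq), round(score, 3))
```

In a real system the variants would be matched against the index, or applied to the index itself, and the confusion weights would be learned rather than fixed by hand.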
Similar resources
Keyword Spotting Based on Phoneme Confusion Matrix
In many practical applications of keyword spotting, the input signal is spontaneous conversation, while the acoustic model was trained on read speech because of data availability. In general, a keyword spotting system degrades significantly because of the mismatch between the acoustic model and spontaneous speech. To solve this problem, this paper presents a two-pass keyword spotting strategy...
Keyword Spotting in A-capella Singing
Keyword spotting (or spoken term detection) is an interesting task in Music Information Retrieval that can be applied to a number of problems. Its purposes include topical search and improvements for genre classification. Keyword spotting is a well-researched task on pure speech, but state-of-the-art approaches cannot be easily transferred to singing because phoneme durations have much higher v...
Phoneme Based Acoustics Keyword Spotting in Informal Continuous Speech
This paper describes several approaches to keyword spotting (KWS) based on Gaussian mixture (GM) hidden Markov modelling (HMM). Context-independent and context-dependent phoneme models are used in our system. The system was trained and evaluated on informal continuous speech. We used KWS recognition networks of different complexities and different types of phoneme models. The impact of these parameters on ...
Bootstrapping a System for Phoneme Recognition and Keyword Spotting in Unaccompanied Singing
Speech recognition in singing is still a largely unsolved problem. Acoustic models trained on speech usually produce unsatisfactory results when used for phoneme recognition in singing. On the flipside, there is no phonetically annotated singing data set that could be used to train more accurate acoustic models for this task. In this paper, we attempt to solve this problem using the DAMP data s...
Phoneme-Lattice to Phoneme-Sequence Matching Algorithm Based on Dynamic Programming
A novel phoneme-lattice to phoneme-sequence matching algorithm based on dynamic programming is presented in this paper. Phoneme lattices have been shown to be a good choice for compactly encoding alternative decoding hypotheses from a speech recognition system. These are typically used for the spoken term detection and keyword-spotting tasks, where a phoneme sequence query is matched to a r...